{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Swinging up a pendulum using DDPG\n",
    "Let's learn how to implement the DDPG for the swinging up pendulum task using stable\n",
    "baselines. First, let's import the necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gym\n",
    "import numpy as np\n",
    "\n",
    "from stable_baselines.ddpg.policies import MlpPolicy\n",
    "from stable_baselines.common.evaluation import evaluate_policy\n",
    "from stable_baselines.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise, AdaptiveParamNoiseSpec\n",
    "from stable_baselines import DDPG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create the pendlum environment using gym:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "env = gym.make('Pendulum-v0')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Get the number of actions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_actions = env.action_space.shape[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We know that in DDPG, instead of selecting the action directly, we add some noise using the Ornstein-Uhlenbeck process to ensure exploration. So, we create the action noise as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=float(0.5) * np.ones(n_actions))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instantiate the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = DDPG(MlpPolicy, env, verbose=1, param_noise=None, action_noise=action_noise)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we can train the agent as usual:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<stable_baselines.ddpg.ddpg.DDPG at 0x7f4f1801f7f0>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent.learn(total_timesteps=25000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training, we can evaluate our agent by looking at the mean rewards:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "mean_reward, n_steps = evaluate_policy(agent, agent.get_env(),\n",
    "n_eval_episodes=10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also have a look at how our trained agent performs in the environment:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "state = env.reset()\n",
    "while True:\n",
    "    action, _states = agent.predict(state)\n",
    "    next_state, reward, done, info = env.step(action)\n",
    "    state = next_state\n",
    "    env.render()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training the agent, we can also look at how our trained agent swings up the\n",
    "pendulum by rendering the environment. Can we also look at the computational graph of\n",
    "DDPG? Yes! In the next section, we will learn how to do that."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Viewing the computational graph in TensorBoard\n",
    "\n",
    "With stables baselines, it is easier to view the computational graph of our model in\n",
    "TensorBoard. In order to that, we just need to pass the directory where we need to store our\n",
    "log files while instantiating the agent as shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "agent = DDPG(MlpPolicy, env, verbose=1, param_noise=None,action_noise=action_noise, tensorboard_log=\"logs\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we can train the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARNING:tensorflow:From /home/sudharsan/anaconda3/envs/universe/lib/python3.6/site-packages/stable_baselines/common/base_class.py:1082: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<stable_baselines.ddpg.ddpg.DDPG at 0x7f4f10154898>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "agent.learn(total_timesteps=25000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After training, open the terminal and type the following command to run the TensorBoard:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`tensorboard --logdir logs`"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
